Skip to content

fix(protocol): unify plan/message write paths through contributeOperation (#228)#237

Merged
windoliver merged 15 commits intomainfrom
worktree-optimized-toasting-perlis
Apr 9, 2026
Merged

fix(protocol): unify plan/message write paths through contributeOperation (#228)#237
windoliver merged 15 commits intomainfrom
worktree-optimized-toasting-perlis

Conversation

@windoliver
Copy link
Copy Markdown
Owner

Summary

Closes #228. Plan and message operations were calling store.put() directly, bypassing PolicyEnforcer, TopologyRouter, after_contribute hooks, and (transitively) the EnforcingContributionStore rate-limit wrapper. This PR routes them through the canonical contributeOperation write path, adds the missing rate-limit wiring on the MCP entry point, replaces the heuristic 60s dedup window with explicit idempotencyKey, and pins everything down with tests.

11 atomic commits, each independently reviewable and tests-clean.

What changed

Architecture

  • createPlanOperation / updatePlanOperation become sugars over contributeOperation (commit 5ccf818). The literal protocol: plan/message operations bypass policy enforcement and topology routing #228 bug — agent with allowedKinds: ['work'] could create plans freely — is closed.
  • sendMessage is deleted and replaced with sendMessageAsDiscussion which sugars over discussOperation/contributeOperation (commit 5535629). MCP grove_send_message, the boardroom HTTP route, the CLI inbox send command, and the e2e test all updated.
  • mcp/serve.ts now wraps the contribution store with EnforcingContributionStore (commit a520eae) — without this, maxContributionsPerAgentPerHour and clock-skew protection were dead config for any MCP-served contribution. Verified end-to-end by a wiring test that asserts the 2nd contribution from the same agent is rejected at the MCP boundary.
  • Explicit idempotencyKey field on ContributeInput (commit b483123) replaces the previous 60s same-summary heuristic that false-positived on legitimate updates and missed real retries under concurrency. Per-agent namespaced, 5min TTL, 1024-entry LRU. HTTP-style semantics.

Routing rules (locked in during the #228 review)

The kind-based skip logic in contributeOperation now applies these rules uniformly:

kind handoffs route event stop conditions
plan no yes no
ephemeral message (context.ephemeral=true) no no no
discussion (regular) yes yes yes
work / review / reproduction / adoption yes yes yes

Plans are coordination metadata (not progress) and ephemeral messages are chat (not coordination). Both opt out of handoff creation and stop-condition evaluation. Ephemeral messages additionally skip the routing event so chat doesn't wake downstream agents. Pinned by 4 routing tests in contribute-routing.test.ts.

Code quality + perf

  • Typed context schemas (PlanContext, MessageContext) in core/operations/context-schemas.ts (commit 0bf7685) replace 7 force-cast / magic-string sites in plan and messaging operations. Zod-backed parsers + builders, 21 unit tests.
  • writeAtomic / writeSerial helpers extracted from the 76-line duck-typed cowrite branch in contributeOperation (commit c6bad62).
  • /tmp/grove-debug.log scaffolding deleted (commit 4dcb5f9) — 8 inline appendFileSync blocks across contribute.ts and serve.ts from a past debugging session.
  • validateArtifacts parallelized with Promise.all (commit 744e743) — N artifacts now pay 1×rtt to CAS instead of N×rtt.
  • handoffStore.createMany batch method (commit 8969f26) — NexusHandoffStore.createMany collapses N handoff inserts into one VFS file write. Fixes the N+1 in the serial write path.
  • updatePlanOperation validates previous.kind === 'plan' (commit 5ccf818) — was previously possible to derive a 'plan update' from a work / review / discussion contribution.

Tests

Test results

374 pass / 0 fail across core/operations/, core/policy-enforcer, core/lifecycle.integration, mcp/server.integration. Two pre-existing flaky tests in tui/spawn-manager and environmental tests for AcpxRuntime / TmuxRuntime are unrelated and were verified to fail on main without these changes.

Stats

  • 11 commits
  • +2126 / -514 lines
  • 18 files changed
  • 6 new test files / files with significantly expanded coverage

Follow-up issues filed

These were deferred from the #228 review per the locked-in plan, with the workaround comments referencing the issue numbers:

Migration / breaking changes

None at the public API level. Internal callers of the deleted sendMessage (server route, CLI, e2e test) updated in the same commit. The idempotencyKey field is opt-in — existing callers see no behavior change.

Test plan

  • bun test src/core/operations/ — 283 pass
  • bun test src/core/policy-enforcer.test.ts — 51 pass
  • bun test src/core/lifecycle.integration.test.ts — 7 pass
  • bun test src/mcp/server.integration.test.ts — 10 pass (incl. new rate-limit wiring test)
  • bun run typecheck — clean
  • Manual: send a message with a contract that has allowedKinds: ['work'], verify it's rejected (regression check for protocol: plan/message operations bypass policy enforcement and topology routing #228)
  • Manual: hit MCP rate limit by issuing more grove_submit_work calls than the per-hour quota allows

…ute and serve

Deletes 8 inline appendFileSync blocks that wrote to a hardcoded
/tmp/grove-debug.log path. These were added during a past debugging
session and have outlived their purpose.

Why: hardcoded path, sync fs in async hot path, require() in ESM,
empty catch swallowing errors, DRY violation across both files,
debug observability mixed into core write path.

Issue 7A in the #228 review (preparation refactor before unifying
the write path through contributeOperation).
Pulls the duck-typed atomic-vs-serial cowrite branching out of
contributeOperation into three named module-private helpers:

- writeAtomic: SQLite cowrite path (one transaction)
- writeSerial: separate-await path (in-memory + Nexus VFS handoffs)
- writeContributionWithHandoffs: dispatch based on store capabilities

The contribution body is now a single call instead of a 76-line
inline branch with capability detection scattered across the
condition. The helpers are independently testable and the dispatch
rule lives in one place.

No behavior change. Verified by 249 core/operations tests passing.

Issue 8B in the #228 review (preparation refactor before unifying
the write path through contributeOperation).
…ecks

Replaces the sequential await loop in validateArtifacts with a single
Promise.all over the artifact entries. A contribution with N artifacts
now pays 1×rtt to the CAS instead of N×rtt — meaningful when CAS is
remote (Nexus) and harmless when local.

The asymmetry was the smoking gun: validateRelations already uses
getMany() for batched lookup. Someone fixed the relation N+1 and
forgot the artifact one. Read-only existence checks have no
atomicity concerns, so Promise.all is the simplest correct fix.

Issue 16B in the #228 review.
…nd message

Adds context-schemas.ts with PlanContext and MessageContext as the
single source of truth for the magic context keys used by plan and
messaging operations.

- PlanContext: { plan_title, tasks }
- MessageContext: { ephemeral, recipients, message_body }

Each schema has a builder (writer side) and a Zod-backed parser
(reader side). Plan and messaging operations now use buildPlanContext
/ buildMessageContext on the write path, and parsePlanContext /
parseMessageContext on the read path. The PlanTask type itself moves
to context-schemas.ts (re-exported from plan.ts for compatibility).

Removes:
- 'as unknown as JsonValue' double-cast on plan tasks (2 sites)
- '(c.context?.plan_title as string)' force-cast in updatePlan
- '(c.context?.recipients as string[])' force-casts in readInbox (3)
- '(c.context?.message_body as string)' force-cast in contributionToMessage
- Manual ephemeral/recipients shape check in readInbox

readInbox now pairs each contribution with its parsed message context
once, then filters/sorts/maps over the typed pair. Wrong-shape
contributions are dropped exactly once at parse time, removing the
ad-hoc 'is this even a message?' check.

Includes 21 unit tests covering the schemas and predicates.

Issue 5A in the #228 review (preparation for the unified write path).
Replaces the previous 60s same-summary dedup heuristic with an
explicit, opt-in idempotency key on ContributeInput. Modeled after
the HTTP Idempotency-Key convention used by Stripe, AWS, and most
modern HTTP APIs.

Why the heuristic was wrong:
- store.list({ limit: 20 }) was unbounded across all agents — under
  multi-agent load, an agent's own recent contributions could be
  pushed out of the window by other agents' writes, missing real
  retries.
- summary + agent + kind false-positived on legitimate re-edits
  (e.g., updatePlan called twice with the same title would silently
  drop the second call).
- The whole thing was a guess at intent. Idempotency keys make the
  contract explicit.

Behavior:
- Opt-in: callers that don't pass idempotencyKey get no dedup.
- Per-agent namespace: keys are scoped by agent.role (or agentId
  fallback). Two agents can use the same key without colliding.
- 5-minute TTL with LRU eviction at 1024 entries. Per-process cache,
  not shared across grove instances.
- The full ContributeResult is cached and returned on hit, including
  routing/handoff/policy fields.

Includes 5 new tests covering: round-trip cache hit, distinct keys,
cross-agent isolation, role-vs-agentId scoping, no-key default.

Issue 4A in the #228 review.

Note: MCP tools currently don't pass idempotencyKey. Adding support
to grove_submit_work / grove_submit_review / etc is a follow-up;
existing MCP retry behavior is unchanged from the previous heuristic
removal (callers must dedup themselves until then).
Fixes #228 for the plan path: createPlanOperation and updatePlanOperation
no longer call store.put() directly. They now sugar over contributeOperation
the same way reviewOperation, reproduceOperation, discussOperation, and
adoptOperation already do, so plans flow through the canonical write
pipeline:

  - PolicyEnforcer (role-kind constraints)
  - validateRelations / validateArtifacts
  - Topology routing event
  - after_contribute hook
  - idempotencyKey dedup

The literal #228 bug — an agent with allowedKinds=["work"] could create
plans freely — is now closed.

Routing rules for the plan kind (locked in during the #228 review):

  kind | handoffs | route event | stop conditions
  plan |    no    |     yes     |       no

Plans are coordination metadata, not progress. They:
- Skip handoff record creation (handoffsRoutedTo = undefined when isPlan)
- Still fire the topology routing event (so live UIs can observe creation)
- Skip stop-condition evaluation (no full-table scans for chat-equivalent
  traffic; no broadcastStop)

Same skip rules apply to ephemeral messages (kind=discussion +
context.ephemeral=true), detected via isEphemeralMessageContext from
the new context-schemas module. The rule table for ephemeral messages:

  kind          | handoffs | route event | stop conditions
  ephemeral msg |    no    |     no      |       no

(Messages skip the route event entirely too — chat shouldn't wake
downstream agents.)

Implementation (Issues 1A + 13A + 6A landing together because the
gating logic is one cohesive change):

- contribute.ts: classifies kind/context up front into skipHandoffs /
  skipRouteEvent / skipStopConditions and threads them through the
  enforcement, write, and post-write blocks.
- policy-enforcer.ts: enforce() takes an options.skipStopConditions
  flag that bypasses the stop-condition evaluation block (the O(n)
  scan plus per-root thread() calls).
- plan.ts: createPlanOperation and updatePlanOperation become thin
  sugars over contributeOperation. updatePlanOperation now also
  validates that previous.kind === 'plan' (Issue 6A), preventing
  cross-kind derives_from corruption.
- Both plan operations expose an idempotencyKey passthrough so MCP
  callers can opt into retry safety.

Issues 1A, 13A, 6A in the #228 review.
…sugar

Closes #228 for the messaging path. The previous sendMessage took a raw
ContributionStore and a computeCid callback, open-coded the contribution
construction, and called store.put() directly — bypassing PolicyEnforcer,
TopologyRouter, after_contribute hook, and validateRelations.

The new sendMessageAsDiscussion is a thin sugar over contributeOperation
that goes through the canonical write pipeline. Behavior changes:

- Role-kind constraints now apply (the literal #228 bug for messages).
  An agent with allowedKinds=['work'] can no longer send messages.
- Idempotency keys work for messages (HTTP-style retry safety).
- Reply targets are validated via validateRelations — sending a reply
  to a non-existent CID returns NOT_FOUND instead of writing an orphan.
- Returns OperationResult instead of throwing/returning bare Contribution,
  matching the operation pattern used everywhere else in core/operations.

Routing rules for ephemeral messages (set in contributeOperation via
isEphemeralMessageContext):

  kind          | handoffs | route event | stop conditions
  ephemeral msg |    no    |     no      |       no

Chat does not wake downstream agents and does not contribute to
stop-condition evaluation.

Caller updates (3C in the #228 review):

- src/mcp/tools/messaging.ts: grove_send_message uses
  sendMessageAsDiscussion + the standard toMcpResult adapter.
- src/server/routes/boardroom.ts: POST /api/boardroom/message uses
  the new helper with toOperationDeps adapter, returns 400 on
  validation errors.
- src/cli/commands/inbox.ts: grove inbox send uses the new helper
  with a small CliDeps→OperationDeps adapter inline.
- tests/boardroom/e2e-workflow.test.ts: rewrites the message send
  steps against the new signature.

Test rewrite (10A in the #228 review):

- src/core/operations/messaging.test.ts: full rewrite. Adds:
  - #228 regression test: contract with allowedKinds=['work']
    blocks sendMessageAsDiscussion (this test would have failed
    against the old implementation; now passes).
  - inReplyTo target validation (NOT_FOUND for missing target)
  - multi-recipient round-trip
  - idempotencyKey collapse
  - empty inbox + non-message discussion exclusion
  - multi-recipient OR semantics in readInbox
  - limit cap enforcement
  17 tests total covering happy path + 11 edge cases.

Issues 3C + 10A in the #228 review.
…serve

The MCP entry point (src/mcp/serve.ts) was wiring the raw contribution
store directly into deps, bypassing rate limits / clock-skew protection
configured in GROVE.md. Compare to src/cli/commands/contribute.ts:353
and src/cli/commands/discuss.ts:127 which already wrap correctly.

The bypass meant maxContributionsPerAgentPerHour /
maxContributionsPerGrovePerHour / maxArtifactSizeBytes were dead config
for any agent talking to grove via MCP — including the agent retry
storms and runaway-loop scenarios these limits exist to prevent.

This is broader than the literal #228 ('plan/message bypass policy')
but in the same family: a contribution operation reaching the store
without going through the configured policy layer. Bundled with the
#228 PR because fixing #228 without this leaves the 'unified pipeline'
claim half-true.

Wiring test (Issue 11A): server.integration.test.ts now exercises
the wrap end-to-end. It builds a contract with
maxContributionsPerAgentPerHour: 1, mirrors serve.ts's wrap logic,
and asserts that the second grove_submit_work call from the same
agent is rejected with a rate-limit error at the MCP boundary. This
test would have failed against the previous serve.ts.

Issues 2A + 11A in the #228 review.
Before this commit, plan.ts had ZERO unit tests. The MCP integration
test only verified the tool was registered by name. After Issue 1A
landed and made plans flow through contributeOperation, the lack of
tests was a real risk: a refactor of contributeOperation could
silently break plan semantics with no safety net.

19 tests covering:

createPlanOperation:
- Happy path with stats computation (todo / in_progress / done / blocked)
- kind=plan + mode=exploration + 'plan' tag stored correctly
- Typed PlanContext via buildPlanContext (plan_title + tasks fields)
- User tag preservation
- Empty title rejection
- Empty task list rejection
- Mixed-status stats counting
- #228 regression: contract with allowedKinds=['work'] blocks plan creation
- idempotencyKey collapses repeated calls

updatePlanOperation:
- v2 with derives_from relation to v1
- Title falls through from previous when omitted
- Explicit title override
- NOT_FOUND when previous CID is missing
- Issue 6A: rejects update when previous CID is not a plan kind
- Empty task list rejection
- 3-version derives_from chain with title fallthrough

Routing semantics (Issues 1A + 13A):
- Plans do NOT generate handoffs even when topology routes the role
- Plans bypass stop-condition evaluation (write succeeds even with
  maxRoundsWithoutImprovement=0 which would normally block)

Issue 9A in the #228 review.
Adds 4 routing tests that codify the per-kind side-effect rules locked
in during the #228 review. These tests prevent Issues 1A + 13A from
becoming silent behavior changes — anyone refactoring the kind-based
skip logic in contribute.ts now has explicit failing tests if they
violate the contract.

Routing rule table:

  kind            | handoffs | route event | stop conditions
  ----------------|----------|-------------|----------------
  plan            |    no    |     yes     |       no
  ephemeral msg   |    no    |     no      |       no
  discussion      |   yes    |     yes     |      yes
  work / review   |   yes    |     yes     |      yes

New tests:

1. plan kind fires routing event but creates no handoff
   - Spy handoff store, verify no create() calls for kind=plan
   - Verify routing event still received by downstream role subscriber

2. ephemeral message kind skips both routing event AND handoffs
   - Discussion with context.ephemeral=true: no handoff, no event

3. non-ephemeral discussion routes normally
   - Discussion without ephemeral flag still creates handoff + event
   - Confirms no regression for the regular discussOperation path

4. plan does not trigger broadcastStop (Issue 13A)
   - Pre-populate store + budget=1 stop condition (would normally fire)
   - Plan write succeeds without triggering broadcastStop because
     plans skip stop-condition evaluation entirely

Issue 12A in the #228 review.
Adds an optional createMany() method to the HandoffStore interface and
uses it from the contributeOperation serial write path. NexusHandoffStore
implements it as a single VFS file write (one casUpdate, one HTTP
round-trip), and create() now delegates to createMany([input]) so the
two paths share one code path.

Before this change, contributeOperation's serial path looped over each
routing target awaiting handoffStore.create() sequentially:

  for (const targetRole of routedTo) {
    await handoffStore.create({ ...sourceCid, fromRole, toRole })
  }

For a contribution routed to N downstream roles via Nexus, that was
N×rtt against the handoff store. The atomic SQLite cowrite path didn't
have this problem (it writes everything in one transaction), but Nexus
is the configured production store in src/mcp/serve.ts.

Now writeSerial calls handoffStore.createMany(inputs) when available
and falls back to the create() loop otherwise. NexusHandoffStore's
createMany() collapses N inserts into one casUpdate cycle.

The HandoffStore interface change is backwards-compatible: createMany
is optional, so existing implementations (in-memory, SQLite) keep
working unchanged. SQLite-backed stores already use the atomic cowrite
path so they don't need the batch method.

Verified by 374 tests across 23 files (operations, policy-enforcer,
lifecycle integration, MCP integration).

Issue 15A in the #228 review.
grove_done writes a kind=discussion contribution to mark session
termination. Before this fix, that discussion was treated like any
other (non-ephemeral) discussion by contributeOperation — topology
routing fired and a handoff record was created for the downstream role.

Discovered during #228 E2E validation. A real review loop produced:

  handoffId     sourceCid       from→to          status
  9152010f...   work            coder→reviewer   replied
  ca1ef83d...   review          reviewer→coder   pending_pickup
  92cf7e7b...   done-discussion reviewer→coder   pending_pickup  ← wart

The ca1ef83d handoff is correct (review feedback routes back to the
coder, who may iterate). The 92cf7e7b handoff is semantic nonsense:
the session is ending, so there's nothing for the "downstream" agent
to pick up. The handoff sits in pending_pickup forever.

Fix: one-line change in src/mcp/tools/done.ts — set
context.ephemeral=true on the done discussion. This routes it through
the same skip path as ephemeral messages in contribute.ts
(isEphemeralMessageContext check), suppressing both the handoff
creation and the routing event. The [DONE] marker still lands in the
DAG as a discussion contribution (so grove_log and the TUI can show
it), but it no longer creates coordination records.

Related:
- Updates isEphemeralMessageContext doc in context-schemas.ts to
  clarify it now covers two shapes: chat messages and done markers.
  The function name stays historical.
- Adds a targeted routing test in contribute-routing.test.ts that
  exercises the exact shape done.ts writes, pinning the behavior
  against future regressions.

Bundled into the #228 PR because the wart was discovered during the
#228 E2E validation and is closely related to the routing rules the
PR establishes.
Codex adversarial review of PR #237 flagged three high-severity issues
that the existing test suite wasn't catching. All three are fixed here.

Finding 1 — Any caller can mark real work as ephemeral and bypass
routing/handoffs/stop checks (contribute.ts)

  isEphemeralMessageContext() returned true for any context.ephemeral=true
  regardless of kind. A caller passing { kind: "work", context:
  { ephemeral: true } } to grove_submit_work would skip the entire
  side-effect pipeline AND be filtered out by the frontier calculator,
  making real progress invisible to downstream reviewers, budget, and
  quorum logic.

  Fix: reject the combination at the boundary with VALIDATION_ERROR,
  and gate the skip path on kind === "discussion". The ephemeral flag
  is now strictly reserved for chat messages and grove_done markers.

  Tests:
  - 'ephemeral flag on non-discussion kind is rejected' — iterates
    over work/review/reproduction/adoption, asserts each is rejected
  - 'ephemeral flag on discussion is allowed' — positive control

Finding 2 — Idempotency cache was not single-flight or
fingerprint-validated (contribute.ts)

  Two flaws:
  a) Two concurrent retries with the same key could both miss the
     cache (populated post-write) and create duplicate writes.
  b) Reusing the same key with different body silently returned the
     first call's result, hiding client bugs.

  Fix: rewrite the cache around CachedIdempotencyEntry with
  { fingerprint, pending?, value? }. Three new helpers:

    - computeIdempotencyFingerprint: canonical hash of
      kind/mode/summary/description/artifacts/relations/tags +
      agentScope (role ?? agentId, matching the cache-key namespace)
    - lookupIdempotency: returns pending | value | conflict | undefined
    - reserveIdempotencySlot: synchronously inserts a pending Promise
      and returns a resolver. Check-then-insert is synchronous
      (single-threaded JS) so concurrent callers cannot both race
      past a miss.

  contributeOperation now:
    1. On hit (pending): awaits the in-flight promise
    2. On hit (value): returns cached result
    3. On fingerprint conflict: returns STATE_CONFLICT
    4. On miss: reserves a slot, runs write, resolves on success
       (cached) or validation-error (cached), releases on thrown
       error (retries can proceed)

  Tests updated + added:
  - 'repeated call with same key + same input returns cached result'
  - 'same key + different input is rejected with STATE_CONFLICT'
  - 'concurrent calls with same key are single-flight (one write)'
    — runs two calls via Promise.all, verifies only one contribution
    in the store
  - 'agent role scopes idempotency: identical payload from two coders
    shares cache' — multi-instance role with identical fingerprint
  - 'role scope: same key + different summary is STATE_CONFLICT'
  - messaging.test.ts + plan.test.ts: matching rename + new conflict
    test

Finding 3 — Wrapping MCP store dropped atomic cowrite to best-effort
(enforcing-store.ts, contribute.ts)

  Commit a520eae wrapped the contribution store with
  EnforcingContributionStore in mcp/serve.ts. But the wrapper didn't
  forward putWithCowrite, so contributeOperation's capability check
  couldn't detect the atomic path — local MCP sessions with a contract
  silently lost transactional contribution+handoff writes.

  Fix: EnforcingContributionStore gains an async putWithCowrite
  method that acquires the shared mutex, runs rate-limit enforcement,
  runs the per-CID preWriteHook, then delegates to inner.putWithCowrite
  (sync, runs inside the inner SQLite transaction).

  writeAtomic in contribute.ts now accepts either a sync or async
  putWithCowrite and awaits the return if it's a Promise.

  Tests:
  - 'exposes putWithCowrite when inner store supports it'
  - 'putWithCowrite runs enforcement then delegates to inner cowrite'
    — observable cowriteRan flag + stored contribution
  - 'putWithCowrite still enforces rate limits' — RateLimitError on
    second write from same agent with max=1

Test results: 371 pass / 0 fail across src/core/operations and
enforcing-store. 1281 pass / 2 fail across all of src/core + MCP
integration; the 2 failures are pre-existing environmental tests
(AcpxRuntime/TmuxRuntime) unrelated to these fixes.

Original findings: codex adversarial review on PR #237 — 2026-04-09
Round 2 of adversarial review on PR #237. Three high-severity issues,
all fixed with targeted regression tests.

Finding 1 — Idempotency fingerprint ignores context, scores, and
artifact names (contribute.ts)

  computeIdempotencyFingerprint() hashed kind/mode/summary/description,
  but explicitly omitted context (where plans store their task list,
  messages store body+recipients), scores (per-metric payloads), and
  artifact NAMES (only hashes). A caller reusing the same key with
  different plan tasks, different scores, or a renamed artifact would
  hit the cached success path instead of STATE_CONFLICT — silently
  acknowledging the second request while the stored contribution
  still reflects the first.

  Fix: expand the fingerprint to include context, scores, and
  name→hash artifact pairs. Add canonicalizeForFingerprint() which
  deep-sorts object keys recursively so { a: 1, b: 2 } and
  { b: 2, a: 1 } hash identically (object-key insertion order is
  not semantic).

  Regression tests:
  - 'fingerprint rejects same key + different context (plan tasks)'
    — two plans with different task statuses, second must reject
  - 'fingerprint rejects same key + different scores' — metric
    value change under same key must reject
  - 'fingerprint rejects same key + renamed artifact (same hash)'
    — same blob, different filename, must reject
  - 'fingerprint is insensitive to context key order' — deep key
    canonicalization test, nested objects

Finding 2 — Post-commit failure clears the idempotency slot and
allows duplicate writes on retry (contribute.ts)

  The slot was only resolved at the END of contributeOperation, after
  all post-write side effects (onContributionWrite callback,
  persistOutcome, topology event, stop recheck, after_contribute hook).
  If any of those threw, the outer catch released the slot — even
  though the contribution was ALREADY durably written. A retry with
  the same idempotencyKey would then execute a fresh write path with
  a new createdAt, producing a duplicate contribution + handoff fan-out.

  Fix: treat the durable contribution write as the idempotent boundary.
  Build a 'committedResult' immediately after
  writeContributionWithHandoffs() returns, resolve the slot with it,
  then CLEAR the local idempotencySlot reference so the outer catch
  cannot release it. Wrap onContributionWrite/onContributionWritten
  callbacks and persistOutcome in try/catch — they log warnings but
  cannot propagate and cannot undo the commit from the caller's
  perspective.

  The direct first caller still receives a 'final' result at the end
  of the function that may include post-write updates (e.g.,
  stop-condition recheck detecting a threshold crossing). Cached
  retries get the narrower committed-only snapshot, which is correct
  — the contribution is the same, only the advisory stop signal
  differs, and cached retries can't observe the updated state anyway.

  Regression test:
  - 'post-commit callback failure does NOT release the idempotency
    slot' — injects a throwing onContributionWritten callback, runs
    the same request twice with the same key, asserts both return
    the same cid AND that exactly ONE contribution exists in the
    store.

Finding 3 — grove inbox send still bypasses contract enforcement
and topology routing (cli/commands/inbox.ts)

  The previous fix (PR #237 commit 5535629) routed inbox send through
  sendMessageAsDiscussion(), but the CLI handler built OperationDeps
  via cliDepsToOperationDeps() which carried only raw store, claim
  store, cas, frontier, outcome store — NO contract, NO handoff
  store, NO topology router. Result: an agent could use `grove inbox
  send` to create a discussion contribution even when GROVE.md
  restricted it via agent_constraints.allowed_kinds, bypassing the
  same check MCP-routed messages now respect. The CLI was the
  remaining backdoor.

  Fix: rewrite handleSend to mirror the `grove discuss` bootstrap:
  load GROVE.md, parse the contract, wrap the contribution store
  with EnforcingContributionStore when a contract exists, include
  the handoff store + contract in the OperationDeps passed to
  sendMessageAsDiscussion. Same pattern as discuss.ts:96-155.

  Regression tests (new src/cli/commands/inbox.test.ts):
  - 'succeeds when GROVE.md does not exist' — no contract, message
    goes through (same policy as discuss)
  - 'succeeds when contract allows discussion kind' — positive
    control, verifies the stored contribution
  - 'rejects when contract restricts allowed_kinds to ['work']'
    — the literal Codex finding. Runs handleInbox with a GROVE.md
    that blocks discussions, asserts non-zero exit, stderr contains
    the role_kind message, and NO discussion contribution landed
    in the DAG.

Test results: 389 pass / 0 fail across src/core/operations,
src/cli/commands/inbox.test.ts, src/core/enforcing-store.test.ts,
src/mcp/server.integration.test.ts.

Original findings: codex adversarial review round 2 on PR #237
Round 3 of adversarial review on PR #237. Two more high-severity
issues fixed with regression tests.

Finding 1 — grove inbox send silently disables enforcement on
malformed GROVE.md (cli/commands/inbox.ts)

  The previous fix wrapped readFile() and parseGroveContract() in
  one broad try/catch. A YAML parse error, a schema validation
  failure, even a permission-denied error were all treated the same
  as ENOENT → contract stayed undefined → EnforcingContributionStore
  was never applied → CLI sends silently bypassed allowed_kinds,
  rate limits, and clock-skew checks.

  This reopened the exact bypass the round-2 enforcement fix was
  supposed to close. A broken GROVE.md could make the CLI send
  path quietly unenforced with no operator-visible signal.

  Fix: separate readFile from parseGroveContract. Only catch ENOENT
  (file-does-not-exist → proceed without enforcement, consistent
  with grove discuss). Any other file error re-throws. Parse errors
  are completely outside the catch so they propagate to the caller.

  Regression test: 'fails closed when GROVE.md is malformed (YAML
  parse error)' writes a GROVE.md with an unclosed bracket and
  asserts the command throws AND no discussion contribution lands
  in the DAG.

Finding 2 — grove_done no longer emits any completion signal to
event-driven clients (core/operations/contribute.ts)

  Round-2 fix made grove_done write context.ephemeral=true so the
  done marker would skip handoff creation. But that also put done
  markers on the 'ephemeral chat' skip path in contributeOperation,
  suppressing the topology route event. Result: event-bus clients
  like TUI's useDoneDetection() never observed session completion
  because it disables polling whenever an EventBus is present and
  waits exclusively for contribution events on the bus. All agents
  could call grove_done and the session would stay stuck in
  'running' forever.

  Fix: split the classification so done markers are asymmetric from
  chat. New routing table:

    kind                 | handoffs | route event | stop conditions
    plan                 |    no    |     yes     |       no
    discussion (done)    |    no    |     yes     |       no
    discussion (chat)    |    no    |     no      |       no
    discussion (plain)   |    yes   |     yes     |       yes
    work / review / etc  |    yes   |     yes     |       yes

  Implementation: contribute.ts now computes isDoneMarker
  (kind=discussion + context.done=true) and isEphemeralChat
  (kind=discussion + isEphemeralMessageContext && !isDoneMarker)
  separately. skipHandoffs covers both; skipRouteEvent covers only
  isEphemeralChat. Done markers still publish through the topology
  router so useDoneDetection advances.

  Regression tests:
  - 'grove_done discussion (done=true) skips handoff but fires
    route event' — writes the exact shape done.ts emits, asserts
    handoff count = 0 AND route event received with the [DONE]
    summary.
  - 'ephemeral chat (no done flag) skips both handoff AND route
    event' — pins the ephemeral-chat branch explicitly so future
    refactors of the classification logic can't collapse the two
    rows back together.

Also updated the pre-existing 'grove_done discussion' test that
previously asserted no route event — it was wrong, now asserts the
correct new behavior.

Test results: 391 pass / 0 fail across all touched suites.

Original findings: codex adversarial review round 3 on PR #237
@windoliver windoliver merged commit 772290d into main Apr 9, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

protocol: plan/message operations bypass policy enforcement and topology routing

1 participant